Original Goal: To search environmental temperature data for intermediate timescales of temperature change.

Original thoughts for a Plan of action: To create a for loop that will:

  1. Take an already-tidy data set and filter for times of the year where daily variation is > x (a user-designated threshold)

  2. For all days with that amount of variability, calculate the average of the preceding 2 weeks

  3. Calculate the average of the following 1, 2, 3, 4, 5, 6, 7, 8, 9, …, n days

  4. If that average exceeds the preceding 2-week average by Y amount, designate the day as “use”; if not, designate it as “toss”

  5. Filter for usable days after dates with high variability

  6. Generate a pretty plot of those days

Idea from Logan: Sliding window: plot the derivative of the data to pull out different levels of variability
Idea from Krista: autoregressive models, and thinking about the freshwater ecology candidate’s approach
-> ESM Lab 5
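A minimal sketch of Logan’s sliding-window/derivative idea in base R. The toy series and window widths here are placeholders, not the real buoy data:

```r
# Sliding window over the derivative of a temperature series:
# first-difference the series, then take rolling means of the absolute
# change at several window widths to pull out variability on
# different timescales. Toy hourly series standing in for water_temp.
set.seed(1)
temp <- 15 + 2 * sin(seq(0, 4 * pi, length.out = 200)) + rnorm(200, sd = 0.3)

# Discrete derivative: change in temperature per hour
d_temp <- diff(temp)

# Centered rolling mean via stats::filter (NA-padded at the edges)
roll_mean <- function(x, k) as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))

var_6h  <- roll_mean(abs(d_temp), 6)   # sub-daily variability
var_24h <- roll_mean(abs(d_temp), 24)  # daily-scale variability
```

Plotting `var_6h` and `var_24h` against time would show how much the series is changing at each timescale; wider windows damp out the hourly noise.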

What I was actually able to do – just graph the data (mean, min, max) and then look at daily variability throughout the year (daily fluctuations in temperature) to start exploring times of year with different amounts of variability
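The “graph the data (mean, min, max)” step might look like the following dplyr summary. This is a sketch on toy data, not the chunk actually used; the column names follow the parse output below:

```r
library(dplyr)

# Toy stand-in for the hourly sst tibble (real data has many more days)
sst <- tibble(
  year = 2016, month = 1, day = rep(1:2, each = 24),
  hour = rep(0:23, times = 2),
  water_temp = 14 + rep(c(0, 0.5), each = 24) + sin(2 * pi * rep(0:23, 2) / 24)
)

# Daily mean/min/max plus the daily range, a simple variability measure
daily <- sst %>%
  group_by(year, month, day) %>%
  summarise(
    t_mean = mean(water_temp, na.rm = TRUE),
    t_min  = min(water_temp, na.rm = TRUE),
    t_max  = max(water_temp, na.rm = TRUE),
    daily_range = t_max - t_min,
    .groups = "drop"
  )
```

`daily_range` across the year is one way to quantify the daily fluctuations explored in the plots below.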

Tidying Data

## Parsed with column specification:
## cols(
##   year = col_double(),
##   month = col_double(),
##   day = col_double(),
##   hour = col_double(),
##   wspeed = col_double(),
##   wdir = col_double(),
##   water_temp = col_double(),
##   wvht = col_double(),
##   wvdpd = col_double(),
##   wvapd = col_double(),
##   buoy = col_character()
## )

Preliminary Plot of Data

# First pass plot to look at the distribution of SST by month
ggplot(sst, aes(x = water_temp, y = as.factor(month))) +
  geom_density_ridges() + # From the ggridges package
  ylab("Month") +
  xlab("Water Temperature (°C)") +
  theme_bw()

Making nicer plots of 1) Daily Temperature and 2) Daily Variation

How to deal with missing data? Want to discuss this more in the group. Sam had sent around this resource from Allison Horst:

https://allisonhorst.shinyapps.io/missingexplorer/#section-introduction

There are quite a lot of missing years in the data set. The naniar package can tell me the proportion of missing data relative to the entire data set:
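The table below has the shape of output returned by `naniar::miss_var_summary()`; a sketch of that call on toy data (the real sst tibble is not reproduced here):

```r
library(dplyr)
library(naniar)

# Toy tibble with some missing values
sst <- tibble(
  year = 2016,
  water_temp = c(14.1, NA, 14.3, NA, 14.0),
  wspeed = c(5, 6, NA, 7, 8)
)

# One row per variable: count and percent missing, most-missing first
miss_var_summary(sst)
```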

## # A tibble: 17 x 3
##    variable   n_miss pct_miss
##    <chr>       <int>    <dbl>
##  1 wvdpd       85669     20.1
##  2 wvht        85575     20.1
##  3 wvapd       85571     20.1
##  4 water_temp  53309     12.5
##  5 wdir        53264     12.5
##  6 wspeed      52180     12.2
##  7 year            0      0  
##  8 month           0      0  
##  9 day             0      0  
## 10 hour            0      0  
## 11 buoy            0      0  
## 12 year1           0      0  
## 13 month1          0      0  
## 14 day1            0      0  
## 15 hour1           0      0  
## 16 date_time       0      0  
## 17 date            0      0

There is a lot of missing data for entire years, and I am not sure how to deal with this, especially since most of the missing data is in the more recent years.

Questions for group:

  1. How to graph variability? Do you like the idea of doing max daily variation across the year?

  2. What to do when the recent data is missing? I.e., if you had an experiment in 2020, but there are large holes in the data after 2016?

To make a daily variability plot, I will just take the latest full year of data (2016) and make a nicer version of that plot.
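A sketch of that plot, assuming the daily-range summary approach; the tibble and column names here are my guesses, not the original code:

```r
library(dplyr)
library(ggplot2)

# Toy daily summary standing in for the summarised sst data
set.seed(3)
daily <- tibble(
  year = rep(c(2015, 2016), each = 365),
  yday = rep(1:365, times = 2),
  daily_range = abs(rnorm(730, mean = 1, sd = 0.4))
)

# Keep the latest full year and plot daily temperature range across the year
daily_2016 <- filter(daily, year == 2016)

ggplot(daily_2016, aes(x = yday, y = daily_range)) +
  geom_line() +
  xlab("Day of year") +
  ylab("Daily temperature range (°C)") +
  theme_bw()
```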

I’m interested in how variability from 1994-1996 compares to 2015-2017. Going to filter the data for a few years early in the data set to visualize how the “smoothed” early data compare to the more recent years. Not feeling super confident in this approach, but thought it could be cool to look at.
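One way to set up the 1994-1996 vs. 2015-2017 comparison (a sketch; assumes the same kind of daily summary as above, with an `era` label for faceting):

```r
library(dplyr)
library(ggplot2)

# Toy daily summary standing in for the real data
set.seed(4)
daily <- tibble(
  year = rep(c(1994:1996, 2015:2017), each = 100),
  yday = rep(1:100, times = 6),
  daily_range = abs(rnorm(600, 1, 0.3))
)

# Label the two eras and drop everything else
era_dat <- daily %>%
  filter(year %in% c(1994:1996, 2015:2017)) %>%
  mutate(era = if_else(year <= 1996, "1994-1996", "2015-2017"))

# Side-by-side panels, one line per year within each era
ggplot(era_dat, aes(x = yday, y = daily_range, colour = factor(year))) +
  geom_line(alpha = 0.6) +
  facet_wrap(~era) +
  theme_bw()
```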

Trying to make a for loop (and failing)
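For the record, here is a rough sketch of how steps 1, 2, 4, and 5 of the original plan might go, vectorized with dplyr rather than a literal for loop. The thresholds `x` and `y` and all column names are placeholders, and this simplifies step 4: it compares each day’s own mean to the preceding 2-week baseline rather than the following-n-day averages of step 3:

```r
library(dplyr)

# Placeholder thresholds from the plan: x = daily-variability cutoff,
# y = required jump above the preceding 2-week mean
x <- 1.5
y <- 0.5

# Toy daily series: date, daily mean temp, daily range
set.seed(2)
daily <- tibble(
  date = seq(as.Date("2016-01-01"), by = "day", length.out = 120),
  t_mean = 14 + cumsum(rnorm(120, sd = 0.2)),
  daily_range = abs(rnorm(120, 1, 0.6))
)

# Mean of the preceding 14 days (excluding the current day); NA early on
prev_14 <- function(v, i) if (i > 14) mean(v[(i - 14):(i - 1)]) else NA_real_

flagged <- daily %>%
  mutate(
    baseline = vapply(seq_along(t_mean), function(i) prev_14(t_mean, i), numeric(1)),
    high_var = daily_range > x,                          # step 1
    use = if_else(t_mean > baseline + y, "use", "toss")  # step 4 (simplified)
  ) %>%
  filter(high_var)                                       # step 5
```

A full version would add step 3 by computing forward-looking averages over 1 to n days for each high-variability date before the use/toss call.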

Parking Lot